Machine Learning


Advice for applying machine learning

Deciding what to try next



Debugging a learning algorithm: 

Suppose you have implemented regularized linear regression to predict
housing prices. However, when you test your hypothesis on a new set of
houses, you find that it makes unacceptably large errors in its
predictions. What should you try next?

- Get more training examples 

- Try smaller sets of features 

- Try getting additional features 

- Try adding polynomial features ($x_1^2$, $x_2^2$, $x_1 x_2$, etc.)

- Try decreasing $\lambda$

- Try increasing $\lambda$


Machine learning diagnostic: 

Diagnostic: A test that you can run to gain insight into what
is/isn’t working with a learning algorithm, and to gain
guidance as to how best to improve its performance.

Diagnostics can take time to implement, but doing so 
can be a very good use of your time. 
Evaluating your hypothesis



$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$


Fails to generalize to new 
examples not in the training set. 

$x_1$ = size of house
$x_2$ = no. of bedrooms
$x_3$ = no. of floors
$x_4$ = age of house
$x_5$ = average income in neighborhood
$x_6$ = kitchen size
...
$x_{100}$


Evaluating your hypothesis

Dataset: 
Training/testing procedure for linear regression

- Learn parameter $\theta$ from training data (minimizing
  training error $J(\theta)$)

- Compute test set error:

$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$
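A minimal sketch of this train-then-test procedure in Python. It is an illustration under stated assumptions, not the course's implementation: it uses a single feature and an unregularized straight-line hypothesis so that the fit has a closed form, and the helper names (`fit_line`, `squared_error_cost`) and the toy numbers are made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares fit of h(x) = theta0 + theta1 * x (no regularization)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def squared_error_cost(theta, xs, ys):
    """J(theta) = 1/(2m) * sum((h(x) - y)^2) over the given examples."""
    theta0, theta1 = theta
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Hypothetical toy data: learn theta on the training set only...
x_train, y_train = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]
theta = fit_line(x_train, y_train)

# ...then measure the error on examples the fit never saw.
x_test, y_test = [3.0, 4.0], [7.0, 10.0]
j_train = squared_error_cost(theta, x_train, y_train)
j_test = squared_error_cost(theta, x_test, y_test)
```

On this toy data the training error is zero while the test error is not, which is the situation a held-out test set is meant to expose.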


Training/testing procedure for logistic regression 

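- Learn parameter $\theta$ from training data

- Compute test set error:

$J_{test}(\theta) = -\frac{1}{m_{test}} \sum_{i=1}^{m_{test}} \left[ y_{test}^{(i)} \log h_\theta(x_{test}^{(i)}) + (1 - y_{test}^{(i)}) \log\left(1 - h_\theta(x_{test}^{(i)})\right) \right]$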

- Misclassification error (0/1 misclassification error):
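Both test-set measures for logistic regression can be sketched as follows. This assumes the usual definitions: the logistic (cross-entropy) test error, and a 0/1 error that counts an example as wrong when the prediction thresholded at 0.5 disagrees with the label. The threshold, the function names, and the toy parameters are assumptions made for the example.

```python
import math


def hypothesis(theta, x):
    """Logistic hypothesis h(x) = sigmoid(theta0 + theta1*x1 + ...)."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def logistic_test_error(theta, X, y):
    """J_test = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = hypothesis(theta, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -total / len(y)


def misclassification_error(theta, X, y, threshold=0.5):
    """Fraction of examples whose thresholded prediction differs from y."""
    wrong = 0
    for xi, yi in zip(X, y):
        predicted = 1 if hypothesis(theta, xi) >= threshold else 0
        wrong += int(predicted != yi)
    return wrong / len(y)


# Hypothetical parameters and test set with one feature per example.
theta = [0.0, 1.0]
X_test = [[-2.0], [-1.0], [1.0], [2.0]]
y_test = [0, 1, 1, 1]
j_test = logistic_test_error(theta, X_test, y_test)
error_rate = misclassification_error(theta, X_test, y_test)
```

Here one of the four test examples (x = -1, y = 1) falls on the wrong side of the threshold, so the misclassification error is 0.25.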
Overfitting example

Once parameters $\theta_0, \theta_1, \ldots, \theta_4$ were fit to some set
of data (training set), the error of the parameters as measured on that
data (the training error $J(\theta)$) is likely to be lower than the actual
generalization error.

$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$



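Model selection

How to choose d, the degree of the polynomial:

1. $h_\theta(x) = \theta_0 + \theta_1 x$

2. $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$

3. $h_\theta(x) = \theta_0 + \theta_1 x + \cdots + \theta_3 x^3$

...

10. $h_\theta(x) = \theta_0 + \theta_1 x + \cdots + \theta_{10} x^{10}$

Choose, say, $\theta_0 + \cdots + \theta_5 x^5$ (the degree with the lowest
test set error).

How well does the model generalize? Report test set error $J_{test}(\theta^{(5)})$.

Problem: $J_{test}(\theta^{(5)})$ is likely to be an optimistic estimate of
generalization error, i.e. our extra parameter (d = degree of
polynomial) is fit to the test set.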


Evaluating your hypothesis 

Dataset: 
Train/validation/test error

Training error:

$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Cross validation error:

$J_{cv}(\theta) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)} \right)^2$

Test error:

$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$
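The three errors share one formula and differ only in which subset of the data they are averaged over. The sketch below assumes a 60/20/20 split into training, cross validation, and test sets (the proportions are an assumption, they are not stated here), and uses a hypothetical dataset and a stand-in hypothesis `h` purely to show the bookkeeping.

```python
import random


def split_dataset(examples, seed=0):
    """Shuffle, then split 60/20/20 into train / cross validation / test."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    n_train = len(data) * 6 // 10
    n_cv = len(data) * 2 // 10
    return data[:n_train], data[n_train:n_train + n_cv], data[n_train + n_cv:]


def cost(h, examples):
    """1/(2m) * sum((h(x) - y)^2); the same formula gives J_train, J_cv, J_test."""
    return sum((h(x) - y) ** 2 for x, y in examples) / (2 * len(examples))


# Hypothetical dataset of (x, y) pairs.
examples = [(i / 10, 1.0 + 2.0 * (i / 10)) for i in range(30)]
train, cv, test = split_dataset(examples)


def h(x):
    # Stand-in hypothesis; in practice theta would be learned from `train` only.
    return 1.0 + 1.5 * x


j_train, j_cv, j_test = cost(h, train), cost(h, cv), cost(h, test)
```

Shuffling before splitting matters when the data is stored in some sorted order, as it is in this toy dataset.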
Model selection

1. $h_\theta(x) = \theta_0 + \theta_1 x$

2. $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$

3. $h_\theta(x) = \theta_0 + \theta_1 x + \cdots + \theta_3 x^3$

...

10. $h_\theta(x) = \theta_0 + \theta_1 x + \cdots + \theta_{10} x^{10}$

Pick the hypothesis with the lowest cross validation error, say
$\theta_0 + \theta_1 x + \cdots + \theta_4 x^4$. The cross validation
data is thus used to estimate d (model selection).

Estimate generalization error on the test set: $J_{test}(\theta^{(4)})$
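The selection loop can be sketched as below: fit one polynomial per degree on the training set, compare them on the cross validation set, and touch the test set only once at the end. This is a rough pure-Python illustration; the small Gaussian-elimination solver, the function names, the noise-free toy data, and the cap at degree 5 are all assumptions made to keep the example short.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_poly(xs, ys, d):
    """Least-squares theta for h(x) = theta0 + theta1*x + ... + theta_d*x^d."""
    X = [[x ** j for j in range(d + 1)] for x in xs]
    A = [[sum(r[i] * r[j] for r in X) for j in range(d + 1)] for i in range(d + 1)]
    b = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(d + 1)]
    return solve(A, b)


def cost(theta, xs, ys):
    """1/(2m) * sum((h(x) - y)^2) for the polynomial with coefficients theta."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sum(t * x ** j for j, t in enumerate(theta))
        total += (h - y) ** 2
    return total / (2 * len(xs))


# Hypothetical noise-free data from a quadratic, stored in a scrambled order.
xs = [(i * 7 % 30) / 29 for i in range(30)]
ys = [1 + 2 * x - 3 * x ** 2 for x in xs]
x_train, y_train = xs[:18], ys[:18]
x_cv, y_cv = xs[18:24], ys[18:24]
x_test, y_test = xs[24:], ys[24:]

# Fit every candidate degree on the training set only.
thetas = {d: fit_poly(x_train, y_train, d) for d in range(1, 6)}
# Choose d with the cross validation set...
cv_errors = {d: cost(thetas[d], x_cv, y_cv) for d in thetas}
best_d = min(cv_errors, key=cv_errors.get)
# ...and report generalization error on the untouched test set.
j_test = cost(thetas[best_d], x_test, y_test)
```

Because d is never chosen by looking at the test set, `j_test` is not biased by the selection step.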
Diagnosing bias vs. variance

Bias/variance

[Figure: three fits of price vs. size: high bias (underfit), “just right”, high variance (overfit)]

Training error: $J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Cross validation error: $J_{cv}(\theta) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)} \right)^2$

[Figure: error vs. degree of polynomial d]


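Diagnosing bias vs. variance

Suppose your learning algorithm is performing less well than
you were hoping ($J_{cv}(\theta)$ or $J_{test}(\theta)$ is high). Is it a bias
problem or a variance problem?

Bias (underfit): $J_{train}(\theta)$ will be high, and $J_{cv}(\theta) \approx J_{train}(\theta)$.

Variance (overfit): $J_{train}(\theta)$ will be low, and $J_{cv}(\theta) \gg J_{train}(\theta)$.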
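One way to turn this question into a check is to compare the two errors directly, using the usual reading: high training error with a similar cross validation error suggests bias, low training error with a much larger cross validation error suggests variance. The sketch below is only a heuristic; the `acceptable_error` level and the `gap_ratio` of 2 are arbitrary assumptions that would need tuning for a real problem.

```python
def diagnose(j_train, j_cv, acceptable_error, gap_ratio=2.0):
    """Rough bias/variance diagnosis from training and cross validation error.

    acceptable_error and gap_ratio are hypothetical knobs, not standard values.
    """
    # High bias: the model cannot even fit the training set well.
    high_bias = j_train > acceptable_error
    # High variance: cross validation error is much larger than training error.
    high_variance = j_cv > gap_ratio * j_train and j_cv > acceptable_error
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfit)"
    if high_variance:
        return "high variance (overfit)"
    return "no obvious problem"


underfit = diagnose(j_train=0.9, j_cv=1.0, acceptable_error=0.2)
overfit = diagnose(j_train=0.01, j_cv=0.8, acceptable_error=0.2)
fine = diagnose(j_train=0.05, j_cv=0.06, acceptable_error=0.2)
```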
Regularization and bias/variance
Linear regression with regularization

Model: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$

[Figure: three fits plotted against size, one per value of $\lambda$]

- Large $\lambda$: high bias (underfit). For example, $\lambda = 10000$ gives
  $\theta_1 \approx 0, \theta_2 \approx 0, \ldots$, so $h_\theta(x) \approx \theta_0$.

- Intermediate $\lambda$: “just right”.

- Small $\lambda$: high variance (overfit).


Choosing the regularization parameter $\lambda$

Model: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$

(n is the number of features; here n = 4, and $\theta_0$ is not regularized.)

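1. Try $\lambda = 0$

2. Try $\lambda = 0.01$

3. Try $\lambda = 0.02$

4. Try $\lambda = 0.04$

5. Try $\lambda = 0.08$

...

12. Try $\lambda = 10$


Pick (say) $\theta^{(3)}$, the parameters that minimize the cross validation
error $J_{cv}(\theta)$, then report the test error $J_{test}(\theta^{(3)})$.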
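The search over $\lambda$ can be sketched as a loop: fit the regularized quartic for each candidate, score each fit on the cross validation set with the unregularized error, and keep the best. The sketch assumes the elided candidates keep doubling (which makes the twelfth 10.24, roughly the 10 in the list), solves the regularized normal equations with a small hand-written solver, and uses made-up toy data with a deterministic wiggle standing in for noise.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_regularized(xs, ys, d, lam):
    """Minimize 1/(2m) * [sum((h - y)^2) + lam * sum_{j>=1} theta_j^2]."""
    X = [[x ** j for j in range(d + 1)] for x in xs]
    A = [[sum(r[i] * r[j] for r in X) for j in range(d + 1)] for i in range(d + 1)]
    for j in range(1, d + 1):  # theta_0 is left unregularized
        A[j][j] += lam
    b = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(d + 1)]
    return solve(A, b)


def cost(theta, xs, ys):
    """Unregularized 1/(2m) * sum((h(x) - y)^2), used for J_cv and J_test."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sum(t * x ** j for j, t in enumerate(theta))
        total += (h - y) ** 2
    return total / (2 * len(xs))


# Hypothetical data: a quadratic plus a small deterministic wiggle.
xs = [(i * 7 % 30) / 29 for i in range(30)]
ys = [1 + 2 * x - 3 * x ** 2 + 0.1 * math.sin(40 * x) for x in xs]
x_train, y_train = xs[:18], ys[:18]
x_cv, y_cv = xs[18:24], ys[18:24]
x_test, y_test = xs[24:], ys[24:]

# Assumed candidate grid: 0, then 0.01 doubled up to 10.24 (12 values).
lambdas = [0.0] + [0.01 * 2 ** k for k in range(11)]
thetas = {lam: fit_regularized(x_train, y_train, 4, lam) for lam in lambdas}
cv_errors = {lam: cost(thetas[lam], x_cv, y_cv) for lam in lambdas}
best_lam = min(cv_errors, key=cv_errors.get)
j_test = cost(thetas[best_lam], x_test, y_test)

# A very large lambda flattens the hypothesis toward h(x) ~ theta_0.
theta_big = fit_regularized(x_train, y_train, 4, 10000.0)
j_cv_big = cost(theta_big, x_cv, y_cv)
```

Note that the regularization term is used only when fitting; the cross validation and test errors are computed without it so that different values of $\lambda$ are compared on the same scale.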


Bias/variance as a function of the regularization parameter $\lambda$
Learning curves

$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$






High bias

[Figure: error vs. m (training set size)]

If a learning algorithm is suffering from high bias, getting
more training data will not (by itself) help much.

High variance

If a learning algorithm is suffering from high variance,
getting more training data is likely to help.
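A learning curve can be computed by refitting on the first m training examples for increasing m, recording the training error on those m examples and the cross validation error on the full cross validation set. The sketch below deliberately fits a straight line to curved data to mimic the high-bias case; the data, the function names, and the closed-form line fit are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of h(x) = theta0 + theta1 * x."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    return mean_y - theta1 * mean_x, theta1


def cost(theta, xs, ys):
    """1/(2m) * sum((h(x) - y)^2) for the straight-line hypothesis."""
    theta0, theta1 = theta
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


# Hypothetical curved data, stored in a scrambled order.
xs = [(i * 7 % 30) / 29 for i in range(30)]
ys = [1 + 2 * x - 3 * x ** 2 for x in xs]
x_train, y_train = xs[:18], ys[:18]
x_cv, y_cv = xs[18:24], ys[18:24]

sizes, train_errors, cv_errors = [], [], []
for m in range(2, len(x_train) + 1):
    theta = fit_line(x_train[:m], y_train[:m])
    sizes.append(m)
    # Training error is measured on the m examples actually used for the fit.
    train_errors.append(cost(theta, x_train[:m], y_train[:m]))
    # Cross validation error is always measured on the whole CV set.
    cv_errors.append(cost(theta, x_cv, y_cv))
```

With m = 2 the line passes through both training points, so the training error starts at essentially zero; with all 18 examples it ends well above zero, because a straight line cannot follow the curve however much data it sees.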
Deciding what to try next



Debugging a learning algorithm: 

Suppose you have implemented regularized linear regression to
predict housing prices. However, when you test your hypothesis on a
new set of houses, you find that it makes unacceptably large errors
in its predictions. What should you try next?

- Get more training examples 

- Try smaller sets of features 

- Try getting additional features 

- Try adding polynomial features ($x_1^2$, $x_2^2$, $x_1 x_2$, etc.)

- Try decreasing $\lambda$

- Try increasing $\lambda$
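Once a diagnostic has pointed to bias or variance, the list above narrows down. The lookup below encodes the usual reading of each option, which is an assumption added here and not stated in the list itself: more data, fewer features, and a larger $\lambda$ address high variance, while extra features, polynomial features, and a smaller $\lambda$ address high bias.

```python
# Assumed grouping (the usual bias/variance reading, not stated in the list).
REMEDIES = {
    "high variance": [
        "Get more training examples",
        "Try smaller sets of features",
        "Try increasing lambda",
    ],
    "high bias": [
        "Try getting additional features",
        "Try adding polynomial features",
        "Try decreasing lambda",
    ],
}


def suggest(diagnosis):
    """Return the options worth trying for a diagnosis, or [] if unknown."""
    return REMEDIES.get(diagnosis, [])


options = suggest("high variance")
```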




